Example structure of a “good” guide
## Warning: Removed 1 rows containing non-finite values (stat_ydensity).
Per guide:
* Per timepoint: t-test for a difference in signal between the RNP-only and 100 fM conditions
* Perform FDR correction for the number of measurements from the start of the experiment to that timepoint
* Return the first timepoint for which the corrected p-value < 0.05
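A minimal sketch of the per-guide procedure above, not the code used for this report. The function name, the column names (`time`, `condition`, `signal`), the two-sided Welch test, and the Benjamini-Hochberg method are all assumptions. "Correction for the number of measurements up to the timepoint" is read here as adjusting the p-values of timepoints 1..i together and checking the i-th adjusted value.

```r
# Hypothetical helper: first timepoint at which the 100 fM signal differs
# significantly from the RNP-only control for one guide.
# Assumes `df` has columns time, condition ("RNP_only" or "100fM"), signal,
# with replicate wells for both conditions at every timepoint.
time_to_detection <- function(df, alpha = 0.05) {
  times <- sort(unique(df$time))
  # One (Welch, two-sided) t-test per timepoint
  pvals <- vapply(times, function(tp) {
    x <- df$signal[df$time == tp & df$condition == "100fM"]
    y <- df$signal[df$time == tp & df$condition == "RNP_only"]
    t.test(x, y)$p.value
  }, numeric(1))
  # At timepoint i, correct for the i tests performed so far (BH is an assumption)
  for (i in seq_along(times)) {
    adj <- p.adjust(pvals[seq_len(i)], method = "BH")
    if (adj[i] < alpha) return(times[i])
  }
  NA_real_  # never detected
}

# Invented toy data: signal separates from background at timepoint 4
toy <- expand.grid(rep = 1:3, time = 1:6,
                   condition = c("RNP_only", "100fM"),
                   stringsAsFactors = FALSE)
noise <- c(0, 0.1, -0.1)[toy$rep]
toy$signal <- ifelse(toy$condition == "100fM" & toy$time >= 4, 5, 1) + noise
time_to_detection(toy)
```

On the toy data the function returns 4, the first timepoint at which the two conditions differ.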
Structures of the two guides that performed well (rate > 1 above background) but lack the hairpin structure (NCR_1346, NCR_1351):
Determination of how much of the predicted hairpin structure needs to be maintained:
## NCR.id spacer
## 120 NCR_1313 GUUUACCUUGGUAAUCAUCU
## 126 NCR_1319 UCAUUAAAUGGUAGGACAGG
## 137 NCR_1330 GCAAUCAAUGGGCAAGCUUU
## 138 NCR_1331 CUUCUCUGUAGCUAGUUGUA
## 139 NCR_1332 GAGUAAAUCUUCAUAAUUAG
## 142 NCR_1335 AUGGUGUCCAGCAAUACGAA
## 143 NCR_1336 GCCGUCUUUGUUAGCACCAU
## 155 NCR_1348 AUUAGCUCUCAGGUUGUCUA
## 156 NCR_1349 UGGUACGUUAAAAGUUGAUG
## 158 NCR_1351 UGGCUACUUUGAUACAAGGU
## 21685 NCR_1410 UGAAUGUAAAACUGAGGAUCUGAAAACU
## 9671 NCR_1412 UAUAAGCAAUUGUUAUCCAGAAAGGUAC
## 10691 NCR_1417 GAUUGAGAAACCACCUGUCUCCAUUUAU
## structure
## 120 ...((((((((.........))))................))))........
## 126 ...((((((((.........))))................))))........
## 137 .......(((((....(((...((........))..))).))))).......
## 138 ..(((.(((........(((((.........)))))......))).)))...
## 139 ...............(((((((..(((......)))...)))))))......
## 142 ................((..((((((((.....))))))))..)).......
## 143 ..............((((((((((........)))..)))))))........
## 155 (((((.((((......(((.((((............))))))))))))))))
## 156 ...((((((((.........))))........)))).((((((...))))))
## 158 (((.(((((((.........))))........))))))(((((...))))).
## 21685 ((((((.((((.........))))......((.....))........)).))))......
## 9671 ...(((.((((.........))))............(((....))).........)))..
## 10691 ...............(((((((((((......................)))))).)))))
## Warning in cor.test.default(GC_content, Estimate, method = "spearman"): Cannot
## compute exact p-value with ties
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'language'
## Warning in cor.test.default(downstream_U, Estimate, method = "spearman"): Cannot
## compute exact p-value with ties
## Warning in cor.test.default(downstream_unstructured_U, Estimate, method =
## "spearman"): Cannot compute exact p-value with ties
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'language'
## Warning in cor.test.default(gRNA_MFE, Estimate, method = "spearman"): Cannot
## compute exact p-value with ties
## Warning in is.na(x): is.na() applied to non-(list or vector) of type 'language'
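The "Cannot compute exact p-value with ties" warnings above come from `cor.test(..., method = "spearman")` falling back to the asymptotic p-value when the data contain ties. A small sketch of requesting that approximation explicitly with `exact = FALSE`, which avoids the warning. The vectors are invented toy values; only the variable names follow the output above.

```r
# Invented toy vectors with tied values (not data from this analysis)
GC_content <- c(0.40, 0.45, 0.45, 0.50, 0.50, 0.55, 0.60, 0.60)
Estimate   <- c(12.1, 30.4, 18.9, 25.0, 27.3, 9.8, 21.5, 33.2)

# exact = FALSE asks for the asymptotic p-value up front,
# so no ties warning is raised
res <- cor.test(GC_content, Estimate, method = "spearman", exact = FALSE)
res$estimate  # Spearman's rho
res$p.value
```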
Model 1: sequence + structure
Model 2: only sequence
Model 3: only structure
Model 4: only sequence (binary)
Model 5: sequence (binary) + structure
Model 6: rate ~ (antitag position 1) * (spacer structure) + (downstream unstructured U)
##
## Call:
## glm(formula = Estimate ~ ., family = "gaussian", data = subset(model6_comparison_data_onehot,
## nchar(spacer) == 20, select = -spacer))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -35.872 -11.505 -1.238 8.439 59.810
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27.8168 2.6737 10.404 < 2e-16 ***
## antitag_pos1_A 0.1987 3.1240 0.064 0.949
## antitag_pos1_C 0.3098 3.5574 0.087 0.931
## antitag_pos1_G -19.4347 3.4947 -5.561 9.21e-08 ***
## antitag_pos1_U NA NA NA NA
## downstream_unstructured_U -16.6583 14.2724 -1.167 0.245
## spacer_structure 6.9620 4.9248 1.414 0.159
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 280.6118)
##
## Null deviance: 65043 on 191 degrees of freedom
## Residual deviance: 52194 on 186 degrees of freedom
## AIC: 1635.1
##
## Number of Fisher Scoring iterations: 2
##
## Call:
## glm(formula = Estimate ~ ., family = "gaussian", data = subset(model6_comparison_data_onehot,
## select = -spacer))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -37.871 -12.056 -0.602 8.752 56.402
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27.798637 2.643318 10.517 < 2e-16 ***
## antitag_pos1_A -0.006136 3.049764 -0.002 0.9984
## antitag_pos1_C 0.608373 3.463683 0.176 0.8608
## antitag_pos1_G -18.332086 3.419921 -5.360 2.3e-07 ***
## antitag_pos1_U NA NA NA NA
## downstream_unstructured_U -13.644894 13.729301 -0.994 0.3215
## spacer_structure 9.614545 4.779510 2.012 0.0456 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 283.4019)
##
## Null deviance: 68755 on 203 degrees of freedom
## Residual deviance: 56114 on 198 degrees of freedom
## AIC: 1738.8
##
## Number of Fisher Scoring iterations: 2
##
## Call:
## glm(formula = (Estimate > 20) ~ ., family = "binomial", data = subset(model6_comparison_data_onehot,
## nchar(spacer) == 20, select = -spacer))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7203 -1.2990 0.7792 0.9761 2.1787
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.597273 0.337095 1.772 0.0764 .
## antitag_pos1_A -0.103502 0.387561 -0.267 0.7894
## antitag_pos1_C -0.007685 0.445336 -0.017 0.9862
## antitag_pos1_G -2.732521 0.599114 -4.561 5.09e-06 ***
## antitag_pos1_U NA NA NA NA
## downstream_unstructured_U -1.403859 1.813101 -0.774 0.4388
## spacer_structure 0.789682 0.680800 1.160 0.2461
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 265.65 on 191 degrees of freedom
## Residual deviance: 226.28 on 186 degrees of freedom
## AIC: 238.28
##
## Number of Fisher Scoring iterations: 4
##
## Call:
## glm(formula = (Estimate > 20) ~ ., family = "binomial", data = subset(model6_comparison_data_onehot,
## select = -spacer))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.830 -1.307 0.745 0.964 2.027
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.595755 0.332701 1.791 0.0733 .
## antitag_pos1_A -0.193418 0.379716 -0.509 0.6105
## antitag_pos1_C 0.001186 0.439055 0.003 0.9978
## antitag_pos1_G -2.443452 0.524733 -4.657 3.22e-06 ***
## antitag_pos1_U NA NA NA NA
## downstream_unstructured_U -0.706516 1.733150 -0.408 0.6835
## spacer_structure 1.087078 0.661398 1.644 0.1003
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 280.84 on 203 degrees of freedom
## Residual deviance: 243.49 on 198 degrees of freedom
## AIC: 255.49
##
## Number of Fisher Scoring iterations: 4
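The fits above use `Estimate ~ .` on one-hot columns, so `antitag_pos1_U` is collinear with the intercept (hence the `NA` row) and no interaction term is estimated. A hedged sketch of fitting the formula in the Model 6 heading directly, with the anti-tag base as a factor and U as the reference level. The data are simulated stand-ins and the effect sizes are invented; only the column names are taken from the output above.

```r
set.seed(1)
n <- 200
# Simulated stand-in data (assumption: not the real guide table)
toy <- data.frame(
  antitag_pos1 = factor(sample(c("A", "C", "G", "U"), n, replace = TRUE),
                        levels = c("U", "A", "C", "G")),  # U is the reference
  spacer_structure = runif(n),
  downstream_unstructured_U = runif(n, 0, 0.3)
)
toy$Estimate <- 25 - 18 * (toy$antitag_pos1 == "G") +
  8 * toy$spacer_structure + rnorm(n, sd = 15)

# rate ~ (antitag position 1) * (spacer structure) + (downstream unstructured U)
fit <- glm(Estimate ~ antitag_pos1 * spacer_structure + downstream_unstructured_U,
           family = gaussian, data = toy)
summary(fit)
```

With factor coding each anti-tag coefficient is a contrast against U, and the `antitag_pos1G:spacer_structure` term tests whether the effect of spacer structure differs for guides with an anti-tag G.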
Model 7: reduced features
Model 8: model cis cleavage site
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
Top guides:
Bottom guides:
## Warning in eval(substitute(expr), data, enclos = parent.frame()): NAs introduced
## by coercion
## Warning: NAs introduced by coercion
## [1] "mixed model failed: NCR_1320"
## [1] "mixed model failed: NCR_1332"
## [1] "mixed model failed: NCR_1387"
## Warning: Removed 27 rows containing non-finite values (stat_smooth).
## Warning: Removed 27 rows containing missing values (geom_point).
## Warning: Removed 5 rows containing missing values (geom_smooth).
gBlock round 2 outlier:
Figure 1A (data): guide design pipeline
Figure 2A: range of observed guide activities
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing non-finite values (stat_bin).
## Warning: Removed 2 rows containing missing values (geom_bar).
Figure 2B: example traces
Figure 2C: viral RNA vs. gBlock
Figure 3: elastic net regression + anti-tag result
##
## Welch Two Sample t-test
##
## data: subset(guide_rate$Estimate, guide_rate$antitag_pos1 != "G") and subset(guide_rate$Estimate, guide_rate$antitag_pos1 == "G")
## t = 7.3507, df = 67.711, p-value = 1.686e-10
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 15.19083 Inf
## sample estimates:
## mean of x mean of y
## 27.45234 7.80380
##
## Welch Two Sample t-test
##
## data: subset(guide_rate$Estimate, guide_rate$antitag_label == "G") and subset(guide_rate$Estimate, guide_rate$antitag_label %in% c("GU", "GUU"))
## t = 2.257, df = 30.276, p-value = 0.01569
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 1.853323 Inf
## sample estimates:
## mean of x mean of y
## 9.179261 1.712470
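The two outputs above have the form of one-sided Welch tests (`alternative = "greater"`, unequal variances, which is the `t.test` default). A minimal sketch of such a call for the anti-tag G comparison. The rates are simulated and the group sizes are invented; only the column names mirror the output.

```r
set.seed(42)
# Simulated rates (assumption: not the measured guide rates)
guide_rate <- data.frame(antitag_pos1 = rep(c("A", "C", "G", "U"), each = 25))
guide_rate$Estimate <- ifelse(guide_rate$antitag_pos1 == "G",
                              rnorm(100, mean = 8, sd = 8),
                              rnorm(100, mean = 27, sd = 18))

# H1: guides without an anti-tag G have a higher mean rate than guides with one
res <- t.test(
  subset(guide_rate$Estimate, guide_rate$antitag_pos1 != "G"),
  subset(guide_rate$Estimate, guide_rate$antitag_pos1 == "G"),
  alternative = "greater"
)
res$p.value
res$conf.int  # one-sided interval, upper bound is Inf
```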
Figure 4C: LOD with Cas13-Csm6 tandem assay
Figure 4D: robustness to genetic variants
Suppl. Figure 1A: random forest variable importance
Suppl. Figure 1B: sequence logo
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.
Suppl. Figure 2A: GC content
Suppl. Figure 2B: hybridization MFE
Suppl. Figure 2C: cleavable U in target context
Suppl. Figure 3A: spacer structure
## Warning: Groups with fewer than two data points have been dropped.
Suppl. Figure 3B: structure of direct repeat
Suppl. Figure 4A: in vivo viral structure
Suppl. Figure 4B: genomic structure vs. rate
Suppl. Figure 5A: multiplex set of 40 vs. primary screen
## Warning: Removed 1 rows containing missing values (geom_point).
Suppl. Figure 5B: leave-one-out counterscreen
Suppl. Figure 5C: human RNA counterscreen
## Warning in sort(as.numeric(guide)): NAs introduced by coercion
Suppl. Figure 6: 32-pool vs. 8-pool w/ forced mismatch
## Warning: Ignoring unknown aesthetics: fill
gBlock rates
## Warning: Removed 5 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing non-finite values (stat_bin).
## Warning: Removed 2 rows containing missing values (geom_bar).
anti-tag complementarity
interaction between anti-tag G and spacer structure
## Warning: Groups with fewer than two data points have been dropped.